An Automated Text Document Classification Framework using BERT

Abstract

Due to the rapid advancement of technology, the volume of online text data from many disciplines is increasing significantly over time. More work is therefore needed to create systems that can effectively classify text in accordance with its content, facilitating the processing and extraction of crucial information. Non-automated systems rely on manual feature extraction and classification, which is error-prone and time-consuming, and choosing the most appropriate algorithms for traditional procedures is typically resource intensive (computational, human, etc.), so these approaches are not a viable solution. To address the shortcomings of traditional approaches, we offer a text document categorization strategy based on a well-known deep learning (DL) algorithm called BERT. The proposed framework is trained and tested using datasets such as the UCI email dataset, which includes spam and non-spam emails, and the BBC News dataset, which includes multiple categories: tech, sports, politics, business, and entertainment. The system achieved a highest accuracy of 91.4% and can be used by different organizations to classify text-based data with high performance. The effectiveness of the proposed framework is evaluated using the evaluation metrics Accuracy, Precision, and Recall.
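The abstract names Accuracy, Precision, and Recall as the evaluation metrics but does not say how they were computed. The following is a minimal Python sketch of the standard definitions, assuming single-label predictions. The function name `evaluate` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy plus per-class precision and recall dictionaries."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")

    # Count true positives, false positives, and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the true class was t
            fn[t] += 1  # missed an instance of class t

    labels = sorted(set(y_true) | set(y_pred))
    # Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 if undefined.
    precision = {
        c: tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0 for c in labels
    }
    recall = {
        c: tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0 for c in labels
    }
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, precision, recall


# Toy spam/non-spam labels for illustration only (not data from the paper).
y_true = ["spam", "spam", "ham", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam"]
accuracy, precision, recall = evaluate(y_true, y_pred)
print(accuracy, precision, recall)
```

For a multi-class dataset such as BBC News, the per-class values could be averaged (for example, macro-averaged) to get a single precision and recall figure. Whether the authors did so is not stated in the abstract.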

Similar resources

Enhanced Information Retrieval from Narrative German-language Clinical Text Documents using Automated Document Classification

The amount of narrative clinical text documents stored in Electronic Patient Records (EPR) of Hospital Information Systems is increasing. Physicians spend a lot of time finding relevant patient-related information for medical decision making in these clinical text documents. Thus, efficient and topical retrieval of relevant patient-related information is an important task in an EPR system. This...

Feature Selection Technique for Text Document Classification: An Alternative Approach

Text classification and feature selection play an important role in correctly assigning documents to a particular category, owing to the explosive growth of textual information from electronic digital documents as well as the World Wide Web. The present challenge in text mining is to select important or relevant features from the large number of features in the data set. The aim o...

Text Document Classification: an Approach Based on Indexing

In this paper we propose a new method of classifying text documents. Unlike conventional vector space models, the proposed method preserves the sequence of term occurrence in a document. The term sequence is effectively preserved with the help of a novel data structure called ‘Status Matrix’. Further, the corresponding classification technique has been proposed for efficient classification of tex...

Text classification with sparse composite document vectors

In this work, we present a modified feature formation technique, graded-weighted Bag of Word Vectors (gwBoWV) by (Vivek Gupta, 2016), for faster and better composite document feature representation. We propose a very simple feature construction algorithm that potentially overcomes many weaknesses in current distributional vector representations and other composite document representation methods w...

Improving Multi-Document Summarization via Text Classification

Developed so far, multi-document summarization has reached its bottleneck due to the lack of sufficient training data and diverse categories of documents. Text classification just makes up for these deficiencies. In this paper, we propose a novel summarization system called TCSum, which leverages plentiful text classification data to improve the performance of multi-document summarization. TCSu...

Journal

Journal title: International Journal of Advanced Computer Science and Applications

Year: 2023

ISSN: 2158-107X, 2156-5570

DOI: https://doi.org/10.14569/ijacsa.2023.0140332